Abstract

The distribution of fitness effects of adaptive mutations remains poorly understood, both empirically and theoretically.
We study this distribution using a version of Fisher's geometrical model without pleiotropy, such that each mutation
affects only a single trait. We are motivated by the notion of an organism's chemotype, the set of biochemical reaction
constants that govern its molecular constituents. From physical considerations, we expect the chemotype to be of high
dimension and to exhibit very little pleiotropy. Our model generically predicts striking cusps in the distribution of the
fitness effects of arising and fixed mutations. It further predicts that a single element of the chemotype should comprise
all mutations at the high-fitness ends of these distributions. Using extreme value theory, we show that the two cusps
with the highest fitnesses are typically well-separated, even when the chemotype possesses thousands of elements; this
suggests a means to observe these cusps experimentally. More broadly, our work demonstrates that new insights into
evolution can arise from the chemotype perspective, a perspective between the genotype and the phenotype.

1. Introduction

Adaptive mutation is fundamental to the evolutionary 
process, and it is medically important to the emergence of 
drug resistance in microbes [1] and tumors [2]. Given the 
selective advantage of a mutation, the probability that it 
fixes in a population (i.e., rises to frequency 1) and the 
mean time to do so are well-known [3]. Comparatively 
little is known, however, about the distribution of selec- 
tive advantages among new mutations. This distribution 
can be experimentally measured by confronting genetically
identical populations with a novel environment, such as a new
food source, and measuring the fitness of newly arising
mutations [4]. Such measurements are difficult because adaptive
mutations are rare; thus theoretical analysis can offer
important insights [5].

A popular predictive framework for studying adaptive
evolution is R. A. Fisher's geometrical model, which considers
adaptation in phenotypic "trait" space [6]. Mutations
are characterized by the phenotypic changes they
induce, which correspond to moves in trait space. Fisher
used this model to argue that evolution is primarily driven
by the accumulation of many mutations that each have
only a small effect [6]. This argument was influential until
Motoo Kimura pointed out that mutations with larger
effects are more likely to fix, so most adaptive mutations
that fix have intermediate effect [7].
Recent studies have applied Fisher's model to a gamut
of questions in evolutionary biology and population genetics;
these include the distribution of mutation fitness
effects near an optimum [5], sequential adaptation [8-10],
and the load of deleterious mutations carried by finite populations
[11, 12]. Of particular note, predictions from the
model regarding epistasis compare favorably with data [13].
The model predicts a roughly exponential distribution of
fitness effects for new mutations [14], similar to mutational
landscape models of adaptive evolution [15]. This prediction
is consistent with experiments in viruses [16] and bacteria
[17], although more recent experiments by Rokyta et
al. point toward a truncated distribution [18]. Here we
consider a geometrical model without pleiotropy, a model 
in which each mutation affects only a single trait. We are 
motivated by considering the phenotype at a finer scale 
than is typical. 

One can view the information specifying an organism 
through a variety of scales [12] . On the largest scale, the 
phenotype of the entire organism, a single mutation of- 
ten affects multiple traits, implying substantial pleiotropy. 
On the finest scale, the genotype, a single mutation often 
affects only one amino acid codon or one regulatory bind- 
ing site, implying no pleiotropy. Systems biology is often 
modeled at the intermediate scale of biochemical reaction 
constants; multiple codons combine to determine a single 
biochemical reaction constant and multiple constants com- 
bine to determine a single phenotypic trait. Motivated by 
this useful intermediate level of description, we introduced
the word "chemotype" [19] to refer to the set of biochemical
reaction constants determining the rates of molecular
reactions in an organism. Other authors have considered
specific biochemical reaction constants to be aspects of the
phenotype, for example Hartl, Dykhuizen and Dean [20].
We find it useful to distinguish the chemotype, because it 
differs in important ways from the large-scale phenotype 
typically considered. 

The chemotype differs from the large-scale phenotype 
in both dimensionality and pleiotropy. The number of independent
high-level phenotypic traits for even a complex
organism may be modest [21, 22]. The number of independent
elements of a chemotype, however, is comparable
to the number of an organism's genes. Each gene codes 
for a protein or RNA with its own biochemical reaction 
constants, so each gene contributes at least one element 
to the chemotype. The chemotype is additionally distin- 
guished by very low pleiotropy, the degree to which single 
mutations affect multiple traits. Recent experiments on
mouse skeletal traits have demonstrated that this system
possesses a moderate degree of pleiotropy; a given mutation
typically affects around five traits [23]. By contrast, a
single mutation is expected to affect only one or a few elements
of the chemotype. This is because single-nucleotide
mutations are dominant in short-term and laboratory evolution
[24, 25], and they typically change only a single protein
residue or a single DNA binding site. Such a change
will in turn impact only one or a few biochemical reac- 
tion constants, implying very low pleiotropy in chemotype 
space. 

Other authors have considered zero pleiotropy geomet- 
ric models in the study of drift load [3H1 [T^]. We focus 
here on the distributions of fitness effects of adaptive mu- 
tations that arise and that subsequently fix in a popula- 
tion. A general argument shows that these distributions 
possess sharp cusps, one for each element of the chemo- 
type. Given the high dimensionality of chemotype space, 
however, it is not obvious whether these cusps are observ- 
able. To address this question, we study a more specific 
model, in which the fitness landscape is Gaussian. For this 
model, we show using extreme value theory that the two 
cusps with the highest fitnesses are well-separated, even in 
a space with thousands of dimensions. This suggests that 
the cusps, and thus the nature of evolution in chemotype 
space, can be studied experimentally. 

2. Model 

As illustrated in Fig. 1, the state of an organism with
$N$ chemotype elements can be represented as a point in
$N$-dimensional space: $\mathbf{k} = (k_1, k_2, \ldots, k_N)$. The change in
state caused by a mutation can be described by an $N$-dimensional
vector $\mathbf{r}$; the mutant has chemotype $\mathbf{k} + \mathbf{r}$.
Because mutations will typically change only one or a few
elements of the chemotype, most pairs of mutations are
orthogonal in this space. We thus restrict our attention to
mutations with zero pleiotropy, which change only a single
chemotype element at a time. Thus $\mathbf{r} = r\,\hat{r}_i$, where $r$ is the
size of the mutation (which may be negative) and $\hat{r}_i$ is a
unit vector along the $i$th coordinate axis. We define each
$\hat{r}_i$ so that mutations which increase fitness have positive
values of $r$. We work in the limit of strong selection and
weak mutation, so that the population is genetically homogeneous
aside from rare mutants that arise one at a time
and either fix or are lost before the next mutation arises.
In this limit, the state of the entire population corresponds
to a single point $\mathbf{k}$ in chemotype space, and fixation of the
mutation $\mathbf{r}$ moves the entire population to $\mathbf{k} + \mathbf{r}$.

Figure 1: Illustration of the model. Motivated by the evolution
of biochemical reaction constants, we consider evolution in a
high-dimensional space with no pleiotropy (chemotype space). The
current uniform chemotype $\mathbf{k}$ of a population is a point in this space.
The optimal chemotype is the origin of our coordinate system and lies
at the center of the fitness contours. In the absence of pleiotropy,
mutations change one element of $\mathbf{k}$ at a time, so moves are made
along the coordinate axes. The dashed arrow indicates an adaptive
mutation of magnitude $r$ in element $k_1$.

2.1. Gaussian landscape 

For analysis, we specialize to a Gaussian fitness landscape
in which the fitness $W(\mathbf{k})$ of a population with chemotype
$\mathbf{k}$ is

$$W(\mathbf{k}) = \exp\left(-\tfrac{1}{2}\,\mathbf{k}\cdot\mathbf{S}\cdot\mathbf{k}\right), \tag{1}$$

where $\mathbf{S}$ is a symmetric positive definite matrix and $\mathbf{S}\cdot\mathbf{k}$
denotes the dot product of the matrix $\mathbf{S}$ and the vector $\mathbf{k}$.
Without loss of generality, the optimum fitness is set to
one.

It is convenient to work with the logarithmic fitness
change $Q$, introduced by Waxman and Welch [27] and defined
as

$$Q = \log\left[\frac{W(\mathbf{k}+\mathbf{r})}{W(\mathbf{k})}\right]. \tag{2}$$

$Q$ is related to the selection coefficient $s$ by $s = e^{Q} - 1$,
and for mutations with small selective advantage $Q \approx s$.
Adaptive mutations are those with $Q > 0$. For the Gaussian
landscape, the log-fitness change caused by a mutation
of size $r$ in chemotype element $i$ is

$$Q_i(r) = -\left(\mathbf{k}\cdot\mathbf{S}\cdot\hat{r}_i\right) r - \tfrac{1}{2}\left(\hat{r}_i\cdot\mathbf{S}\cdot\hat{r}_i\right) r^2. \tag{3}$$



The largest possible gain in log-fitness achievable by mutating
chemotype element $i$ is denoted $\theta_i$ and obtained by
maximizing $Q_i(r)$ with respect to $r$:

$$\theta_i = \frac{\left(\mathbf{k}\cdot\mathbf{S}\cdot\hat{r}_i\right)^2}{2\,\hat{r}_i\cdot\mathbf{S}\cdot\hat{r}_i}. \tag{4}$$

The magnitude of the largest possible mutation in chemotype
element $i$ that can be made without decreasing fitness
is $\rho_i$:

$$\rho_i = \frac{2\left|\mathbf{k}\cdot\mathbf{S}\cdot\hat{r}_i\right|}{\hat{r}_i\cdot\mathbf{S}\cdot\hat{r}_i}. \tag{5}$$

These quantities are illustrated in Fig. 2A.

Many of our results are derived for spherically symmetric
fitness landscapes, for which $\mathbf{S} = \lambda\mathbf{I}$, where $\mathbf{I}$ is the
identity matrix. For such a landscape,

$$\theta_i = \frac{\lambda\left|\mathbf{k}\cdot\hat{r}_i\right|^2}{2} = Q_{\mathrm{tot}}\left|\hat{k}\cdot\hat{r}_i\right|^2$$

and

$$\rho_i = 2\left|\mathbf{k}\cdot\hat{r}_i\right|. \tag{6}$$

Here $\hat{k} = \mathbf{k}/|\mathbf{k}|$ is the unit vector along $\mathbf{k}$, and
$Q_{\mathrm{tot}} = \lambda|\mathbf{k}|^2/2 = -\log W(\mathbf{k})$ is the difference in log-fitness
between the optimum (which has a fitness of one)
and the current chemotype.
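
The quantities defined in Eqs. 3-5 are simple to evaluate numerically. The following is a minimal sketch, assuming that each unit vector $\hat{r}_i$ lies along the $i$th coordinate axis and is oriented toward higher fitness. The function names and the example values of `S` and `k` are hypothetical, chosen only for illustration.

```python
def s_dot_k(S, k):
    """Return the vector S.k for a square matrix S (list of rows)."""
    n = len(k)
    return [sum(S[i][j] * k[j] for j in range(n)) for i in range(n)]


def log_fitness_change(S, k, i, r):
    """Eq. 3 for a mutation of size r in element i.

    The unit vector is oriented so that adaptive mutations have r > 0,
    which makes the linear coefficient -(k.S.r_hat_i) equal to |(S.k)_i|.
    """
    gain = abs(s_dot_k(S, k)[i])
    return gain * r - 0.5 * S[i][i] * r * r


def cusp_quantities(S, k):
    """Return (thetas, rhos): Eq. 4 and Eq. 5 for every chemotype element."""
    Sk = s_dot_k(S, k)
    n = len(k)
    thetas = [Sk[i] ** 2 / (2.0 * S[i][i]) for i in range(n)]
    rhos = [2.0 * abs(Sk[i]) / S[i][i] for i in range(n)]
    return thetas, rhos


# Hypothetical example: spherical landscape with lambda = 2 and k = (3, 4),
# so that Q_tot = lambda * |k|^2 / 2 = 25.
S_example = [[2.0, 0.0], [0.0, 2.0]]
k_example = [3.0, 4.0]
thetas_example, rhos_example = cusp_quantities(S_example, k_example)
```

In this example the $\theta_i$ are 9 and 16, which sum to $Q_{\mathrm{tot}} = 25$ as expected for a spherical landscape, and the $\rho_i$ are 6 and 8, twice the magnitudes of the components of $\mathbf{k}$.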

3. Results 

We first show generally that adaptation on a smooth 
fitness landscape without pleiotropy leads to cusps in the 
distribution of fitness effects of newly arising and fixed 
mutations, one cusp for each element of the chemotype 
(i.e., each dimension of the landscape). To assess whether 
these cusps will be observable, we specialize to Gaussian 
landscapes for which we can perform detailed calculations. 
We show that the cusps will be difficult to directly ob- 
serve in a histogram of experimental fitness measurements. 
Nevertheless, we show that the cusps do have observable 
consequences, because the two cusps with the highest fit- 
nesses are typically well-separated, even if the landscape 
has thousands of dimensions and deviates strongly from a 
sphere. 

3.1. Cusps 

Fig. 2A illustrates the slice $Q_i$ of the fitness landscape
accessible by mutations to a particular chemotype element
$i$. For a mutation of size $r$, the range of mutations $\Delta r$
about $r$ that produce fitness changes in a given range $\Delta Q$
is inversely proportional to the slope $dQ_i/dr$. By definition,
$Q_i$ is at a maximum for the fittest mutation $r^*$ and
thus has zero slope there. This yields an infinite inverse
and thus a cusp in the distribution of fitness effects of
adaptive mutations to chemotype element $i$, as illustrated
in Fig. 2B.



The above argument relies only on a lack of pleiotropy, 
a smooth fitness landscape, and a distribution of fitness
effects which is non-singular and broad enough to access
the optimal mutation of a given chemotype element. A 
natural question is whether these cusps will be observable 
in experiments. To address this, we analyze a more specific 
landscape for which we can perform concrete calculations. 
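
The form of the singularity can be made explicit with a short worked derivation. This is a sketch under stated assumptions: $Q_i$ is twice differentiable with a non-degenerate maximum $\theta_i$ at $r^*$, the mutational density $f(r)$ is approximately constant near $r^*$, and $b_i = -Q_i''(r^*) > 0$ is a curvature symbol introduced here only for illustration.

```latex
% Expand Q_i about its maximum and invert for the mutation size:
Q_i(r) \approx \theta_i - \tfrac{1}{2}\, b_i\,(r - r^*)^2
\quad\Longrightarrow\quad
|r - r^*| = \sqrt{\frac{2(\theta_i - Q)}{b_i}} .

% The slope at the two mutation sizes that give log-fitness change Q:
\left|\frac{dQ_i}{dr}\right| = b_i\,|r - r^*| = \sqrt{2\, b_i\,(\theta_i - Q)} .

% Both roots contribute, so near the maximum the density of fitness effects is
f_i(Q) \;\propto\; \frac{2\, f(r^*)}{\sqrt{2\, b_i\,(\theta_i - Q)}},
\qquad Q < \theta_i ,
% and it vanishes for Q > \theta_i.
```

The density therefore diverges as an inverse square root on approaching $\theta_i$ from below and drops to zero above it, which is an integrable singularity. For a quadratic $Q_i$ the expansion is exact rather than approximate.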

3.2. Gaussian landscape 

Comparisons between empirical mutation effect distributions
in different environments for several organisms
support a Gaussian form for the fitness landscape close
to the optimum chemotype [28]. For the remainder of this
paper, we assume a Gaussian landscape as in Eq. 1.

We must also specify the distribution of mutational
effects $f(r)$ on the chemotype. In Appendix A we use the
fact that most mutations are deleterious [29, 30] to show
that this distribution must typically span the full range
of adaptive mutations for all elements of the chemotype.
In other words, mutations must often 'hop over' the ridge
of increased fitness. For computational ease, we take $f(r)$
to be uniform over the range of adaptive mutations; our
qualitative results are robust to this choice.

Given these two assumptions, we can calculate the distribution
$f_a(Q)$ of fitness effects for adaptive mutations,
as detailed in Appendix B:

$$f_a(Q) \propto \sum_i \frac{1}{\sqrt{\hat{r}_i\cdot\mathbf{S}\cdot\hat{r}_i\,(\theta_i - Q)}}. \tag{8}$$

Here the sum is over elements of the chemotype (the $i$th term
contributes only for $Q < \theta_i$), and each
element contributes its own cusp. Fig. 3A plots $f_a(Q)$
for a population at a particular random chemotype $\mathbf{k}$ in a
30-dimensional spherical fitness landscape (in which $\mathbf{S}$ is
proportional to the identity matrix). Note that the distribution
is bounded by a roughly exponential envelope. As
detailed in Appendix C, if we average our distribution over
initial states $\mathbf{k}$ with a given fitness, we obtain Waxman and
Welch's previous result [27] for the spherical model with
maximum pleiotropy.

For a large population, the probability that an adaptive
mutation fixes is proportional to its fitness effect $Q$ [31];
thus the density of fitness effects of fixed mutations $f_f(Q)$
is

$$f_f(Q) \propto Q\,f_a(Q). \tag{9}$$

This density of fixed mutations is shown in Fig. 3B for the
same spherical fitness landscape and initial chemotype $\mathbf{k}$ as
in Fig. 3A. The cusps at large $Q$ are much more prominent
in the distribution of fixed mutation fitness effects.
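
The two densities can be evaluated directly for a random chemotype. The sketch below assumes a spherical landscape, for which the factor $\hat{r}_i\cdot\mathbf{S}\cdot\hat{r}_i$ is the same for every element and can be absorbed into the proportionality constant. It returns unnormalized values, and the function names are hypothetical.

```python
import math
import random


def spherical_thetas(n, q_tot, rng):
    """theta_i = Q_tot * (component of a random unit vector)^2, one per element."""
    g = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm2 = sum(x * x for x in g)
    return [q_tot * x * x / norm2 for x in g]


def f_a(q, thetas):
    """Unnormalized density of adaptive fitness effects (spherical landscape).

    Each element contributes an inverse-square-root cusp below its theta_i
    and nothing above it.
    """
    return sum(1.0 / math.sqrt(t - q) for t in thetas if t > q)


def f_f(q, thetas):
    """Unnormalized density of fixed fitness effects: Q times f_a(Q)."""
    return q * f_a(q, thetas)


# Hypothetical usage: tabulate both densities for a 30-element chemotype.
rng = random.Random(0)
thetas = spherical_thetas(30, 1.0, rng)
grid = [0.001 * j for j in range(1, 1000)]
table = [(q, f_a(q, thetas), f_f(q, thetas)) for q in grid]
```

Multiplying by $Q$ suppresses the many cusps near $Q = 0$ and leaves the few cusps at large $Q$ standing out, which is why the fixed-mutation density shows them more clearly.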

Distributions of fitness effects can be measured experimentally
by introducing identical populations to identical
novel environments and tracking mutations that sweep
through them. The histogram in Fig. 3B simulates such an
experiment, representing 1000 samples from $f_f(Q)$, each
polluted by Gaussian measurement noise in $Q/Q_{\mathrm{tot}}$ with a
standard deviation 0.01. Given the rarity of adaptive mutations,
such an experiment would be difficult, but even
such a difficult experiment cannot directly resolve the cusps.
We now show, however, that the spacing between the fittest
and second-fittest cusps is typically substantial. This implies
that the upper end of the fitness distribution should
be dominated by mutations to a single element of the
chemotype, a prediction that can be tested experimentally.

Figure 2: Origin of cusps. A: The log-fitness $Q_i(r)$ as a function of the mutation size $r$ in chemotype element $i$. $Q_i(r)$ has a maximum of $\theta_i$
at $r = r^*$, and it returns to zero at $r = \rho_i$. B: The probability density of mutation fitness effects. It generically has a cusp at $Q_i(r) = \theta_i$,
corresponding to the point at which $Q_i(r)$ has zero slope. The density is plotted sideways to emphasize the connection between the cusp and
the maximum of $Q_i(r)$.

Figure 3: Example of cusps. A: The probability distribution $f_a(Q)$
of the fitness effects of adaptive mutations for a 30-dimensional spherical
fitness landscape with a random initial chemotype $\mathbf{k}$. B: The
probability distribution $f_f(Q)$ of fitness effects of fixed mutations,
for the same $\mathbf{k}$ as in A. The cusps at large values of the fitness $Q$
are much more prominent. The grey histogram shows 1000 samples
from the distribution, each including a 1% error in the measurement
of $Q/Q_{\mathrm{tot}}$.

3.2.1. Cusp spacings 

Each cusp in Fig. 3 corresponds to mutations affecting
a different chemotype element $k_i$. Thus our model not only
predicts cusps, but also predicts that the most adaptive
mutations will all affect the same element of the chemotype.
To experimentally test this prediction, it suffices to
measure relative fitness differences of order $\Delta$, where

$$\Delta = (\theta_1 - \theta_2)/\theta_1 \tag{10}$$

is the normalized separation between the two cusps with
the highest fitnesses. In Appendix D we derive the distribution
of $\Delta$ predicted by our model for a spherical landscape,
using methods of extreme value theory [32].

The solid line in Fig. 4 is the exact asymptotic result
(using Eqs. D.4 and D.5) for the mean of $\Delta$, given a spherical
fitness landscape. The dashed line is the approximation

$$\langle\Delta\rangle \approx \frac{1}{1 + \log N - \log\sqrt{\pi/2}}, \tag{11}$$

which is valid for large landscape dimension $N$. The circles
in Fig. 4 are the results from numerical simulations in the
spherical landscape at each $N$. The agreement between
the exact asymptotic result and the numerical simulations
is excellent, and the approximate result captures the trend
well. Note that $\langle\Delta\rangle$ declines very slowly as a function of
$N$; for a chemotype with $N = 10{,}000$ elements the mean
$\Delta$ is approximately 0.11, a relative fitness difference that
is straightforward to measure experimentally. For comparison,
Fig. 3 has $\Delta \approx 0.27$, which is approximately the
predicted $\langle\Delta\rangle$ when the chemotype has $N = 30$ elements.
Thus our model predicts that, even for a high-dimensional
chemotype, a substantial range $\Delta$ of the most adaptive
mutations will affect the same element of the chemotype.

Figure 4: Mean relative spacing $\Delta$ between the fittest and
second-fittest cusps as a function of landscape dimension $N$. The solid line
is the asymptotically exact result for the spherical fitness landscape,
while the dashed line is the approximation of Eq. 11. The circles
are numerical simulations for the spherical landscape, while the diamonds
and triangles are simulation results for mildly and severely
non-spherical landscapes, respectively. The mean value of $\Delta$ declines
very slowly with $N$ (especially for large $N$), suggesting that
the cusps will typically be well-separated even for chemotype spaces
of high dimension.
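
The numbers quoted for the mean spacing can be checked with a few lines of code. The sketch below evaluates the asymptotic mean from Appendix D, the large-$N$ approximation, and a direct Monte Carlo estimate for a spherical landscape. It uses only the Python standard library, so the inverse error function is obtained by bisection on `math.erfc` (a substitution made here for convenience). All function names are hypothetical.

```python
import math
import random


def inverse_erf_tail(n, tol=1e-13):
    """Solve erfc(x) = 1/n by bisection, i.e. x = erfinv(1 - 1/n), for n > 1."""
    lo, hi = 0.0, 10.0
    target = 1.0 / n
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if math.erfc(mid) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def alpha_asymptotic(n):
    """Scale parameter: N times the density of omega at its typical maximum."""
    x = inverse_erf_tail(n)
    u = 2.0 * math.log(math.sqrt(2.0) * x)
    density = math.exp(-0.5 * (math.exp(u) - u)) / math.sqrt(2.0 * math.pi)
    return n * density


def mean_delta_asymptotic(n):
    """Mean spacing 1 / (1 + alpha) from extreme value theory."""
    return 1.0 / (1.0 + alpha_asymptotic(n))


def mean_delta_approx(n):
    """Large-N approximation of the mean spacing."""
    return 1.0 / (1.0 + math.log(n) - math.log(math.sqrt(math.pi / 2.0)))


def mean_delta_simulated(n, trials, seed=0):
    """Monte Carlo mean of 1 - theta_2/theta_1 for a spherical landscape.

    theta_i is proportional to the square of a Gaussian component, and the
    overall normalization cancels in the ratio.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        squares = sorted((rng.gauss(0.0, 1.0) ** 2 for _ in range(n)), reverse=True)
        total += 1.0 - squares[1] / squares[0]
    return total / trials
```

For example, `mean_delta_asymptotic(10000)` evaluates to about 0.11 and `mean_delta_asymptotic(30)` to about 0.27, matching the values quoted in the text, while `mean_delta_approx(10000)` gives about 0.10.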

3.2.2. Non-spherical landscapes

To assess the robustness of our results regarding cusp
spacing, we numerically test them in non-spherical landscapes.
Studying non-spherical fitness landscapes also implicitly
considers different mutation scales amongst chemotype
elements, because any differences in the typical size of
chemotype mutation effects on different elements (anisotropic
$f(r)$) can be eliminated by rescaling the chemotype elements
$k_i$. A convenient way to characterize a landscape
is by the eigenvalues of $\mathbf{S}$. Spherical landscapes have all
eigenvalues equal, while for non-spherical landscapes the
width of the fitness contour along any given eigenvector of
$\mathbf{S}$ is inversely proportional to the square root of the corresponding
eigenvalue, so landscapes with a larger range of eigenvalues
are more non-spherical.

For a given landscape $\mathbf{S}$ and initial chemotype $\mathbf{k}$, $\Delta$ can
be calculated numerically from the definition of $\theta_i$. In the
tests described below, the mean of $\Delta$ is calculated from
10^ simulations, each instance involving an independent 
landscape S and initial chemotype k. The eigenvectors of 
S were random orthogonal vectors, and the initial chemo- 
types were chosen at random among those with a fixed 
fitness $Q_{\mathrm{tot}}$, as described in Appendix E.

The diamonds in Fig. 4 result from mildly non-spherical
fitness landscapes corresponding to eigenvalues of $\mathbf{S}$ drawn
uniformly from the range $0.4 < \lambda < 3.6$, following Waxman
[33]. The deviations of $\langle\Delta\rangle$ from the spherical case
are small.

The triangles in Fig. 4 arise from "sloppy" fitness landscapes
[34, 35] with the $N$ eigenvalues evenly spaced in the
logarithm over a range of $10^{12}$. This corresponds to the narrowest
axis of the fitness contours being one-millionth the
width of the longest axis. Even for these very non-spherical
fitness landscapes, the average spacing $\langle\Delta\rangle$ remains substantial
and comparable to the average in the spherical
case.

4. Discussion 

We analyzed adaptive mutation in a version of Fisher's 
geometric model in which mutations are restricted to act- 
ing in only one dimension at a time. This condition of 
zero pleiotropy is appropriate when the population is de- 
scribed in terms of its chemotype, the biochemical reaction 
constants of the molecules that comprise the organism, 
only one or a few of which will be altered by any given 
point mutation. We showed that the probability density 
of fitness effects of adaptive mutations will generically ex- 
hibit cusps, each associated with mutations of a particular 
chemotype element. These cusps are particularly promi- 
nent in the density of fitness effects of fixed mutations. 
Simulations suggest that directly resolving these cusps experimentally
will be difficult. However, each cusp corresponds
to a different element of the chemotype, and we
showed that the relative spacing between the two cusps 
with the highest fitness remains substantial even in very 
non-spherical landscapes of high dimension. This suggests 
a testable prediction that is robust to details of the model: 
the fittest mutations should all affect the same element of 
the chemotype. 

It may be surprising that even very non-spherical fitness
landscapes (range in eigenvalues of $10^{12}$) yield a qualitatively
similar cusp distribution to the spherical landscape.
Our simulations assume that the eigenvectors, and
thus the correlations between chemotype elements and fitness,
are random. In this case, each chemotype element
contributes about equally to each eigenvector, so the fitness
function is similar when projected along each chemotype
direction, yielding a narrow distribution of $\theta_i$ which
is similar to the spherical case. The assumption of random
correlation structure is motivated by empirical study
of the sensitivity of biochemical networks to reaction constant
variation [36] and theoretical study of sloppy systems
in general [35]. In both cases, random eigenvectors are a
reasonable approximation to the complicated correlations
found.

A key assumption of our model is that each chemotype 
element is continuously adjustable throughout the range 
of possible adaptive mutations. Because the genetic code 
is discrete, this cannot be strictly true. The distribution 
of effects of random mutations on chemotype elements is 
not well-known, in part because most biochemical stud- 
ies focus on mutations of large effect. However, studies 
have shown that random mutations can introduce small 
but non-zero changes to the enzymatic activity of pro- 
teins [37] and the expression driven by promoter sites [38] , 
suggesting that our assumption of continuous chemotype
variation is reasonable. 

The substantial average separation $\langle\Delta\rangle$ we predict between
the fittest and second-fittest cusp implies that the
mutations conveying the largest fitness benefits will all involve
a single chemotype element. A similar result holds
for the mutational landscape model [121 , in which the 
largest fitness spacings are between the fittest sequences. 
The spacing distributions in the mutational landscape model, 
however, depend on the correlation assumed between the 
effects of different mutations. A recent analysis of such cor- 
relations considers mutations within "blocks" of sequence 
that contribute independently and identically to fitness [40].
Each block may be roughly interpreted as a different chemo- 
type element in our model, but in our model the relative 
contributions to fitness differ between blocks and naturally 
arise from the structure of the landscape. Nevertheless, 
the fact that both models predict the upper end of the fit- 
ness distribution to be dominated by few mutations, or in 
our case mutations in a few chemotype elements, may help 
explain the large amount of parallel evolution that can be 
observed in separate populations exposed to similar environments
[41, 42].

The distribution of fitness effects of adaptive mutations 
has been studied in bacteria and viruses [HKTT]. Typically 
the distribution is found to be consistent with a smooth 
exponential distribution. Our theory predicts only gentle 
cusps in this distribution, but much more prominent cusps 
in the distribution of fitness effects of fixed mutations.
This distribution has been studied experimentally in bac- 
teria [m HSl [501 1 and those results are also consistent with 
a smooth distribution. We showed, however, that it would 
be difficult for such experiments to directly resolve the 
cusps. Intriguingly, a recent study of virus adaptation by 
Rokyta et al. points toward a fitness effects distribution 
with a truncated right-hand tail [18], consistent with our
model. 

Directly resolving the cusps is challenging; it will be 
easier to test the prediction that the fittest mutations will 
all affect a single element of the chemotype. Given that 
the average fittest cusp separation $\langle\Delta\rangle$ is roughly 0.1 even
for very large $N$, resolving this effect requires a relative
precision in fitness of a few percent, which is achievable
by averaging repeated assays. Recent developments in
microarray-based genotyping [24] allow the sites of mutations
to be cost-effectively identified. Mutations that
reside in, for example, the same region of a protein likely 
affect the same element of the chemotype. Correlating 
fitness measurements of mutations with identification of 
which chemotype element they affect will allow direct test- 
ing of our model predictions. 

Motivated by the adaptation of the chemotype, the set 
of biochemical rate constants comprising an organism, we 
have studied a version of Fisher's geometrical model without
pleiotropy. The model predicts cusps in the distribution
of fitness effects of fixed mutations and that the
fittest mutations all involve a single element of the chemotype.
Analysis suggests that the second prediction is experimentally
accessible. More broadly, our work suggests
that viewing evolution in terms of the chemotype may of- 
fer new insights beyond those found at the genotype or 
phenotype level. 
Appendix A. Scale of chemotype mutation

If the distribution $f(r)$ of mutation effects on the chemotype
were small for $r$ greater than the typical $\rho_i$, the mutational
distance over which adaptive mutations are possible,
then the fraction $P_a$ of mutations that were adaptive
would be approximately one-half. The rarity of adaptive
mutations thus suggests that $f(r)$ must be appreciable for
$r$ greater than the typical $\rho_i$. We now show that $f(r)$ must
remain substantial even for $r$ greater than the largest $\rho_i$.
To do so, we make the simplifying assumption that the
distribution of mutational effects is identical for all chemotype
elements. We then consider the scenario in which this
distribution barely covers the range of all possible adaptive
mutations, extending only to the largest of the $\rho_i$, denoted
$\max_i \rho_i$. For this scenario, we derive an analytic approximation
to $P_a$ for spherical landscapes, and we calculate $P_a$
numerically for the non-spherical landscapes considered in
Fig. 4. In both cases we find that this scenario leads to an
unrealistically high probability of adaptive mutation, implying
that the distribution of mutation chemotype effects
must have a scale larger than that of the largest possible
adaptive mutation.

If the probability density of mutation chemotype effects
$f(r)$ were uniform over $(-\max_i \rho_i, +\max_i \rho_i)$, the
probability of a random mutation being adaptive would
be

$$P_a = \frac{\sum_i \rho_i}{2N \max_i \rho_i}. \tag{A.1}$$

The numerator is the total length of intervals where mutations
are adaptive, and the denominator is the total length
of intervals over which mutations are distributed.
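
Eq. A.1 can be evaluated directly for a randomly oriented chemotype in a spherical landscape. The following sketch does so and also provides the large-$N$ estimate $1/\bigl(2\sqrt{\pi\log(N\sqrt{2/\pi})}\bigr)$ obtained in this appendix for comparison. It assumes $|\mathbf{k}| = 1$, which does not affect the ratio, and the function names are hypothetical.

```python
import math
import random


def p_adaptive(rhos):
    """Eq. A.1: adaptive fraction when f(r) is uniform out to the largest rho_i."""
    return sum(rhos) / (2.0 * len(rhos) * max(rhos))


def random_spherical_rhos(n, rng):
    """rho_i = 2 |k . r_hat_i| for a random unit chemotype vector k."""
    g = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(x * x for x in g))
    return [2.0 * abs(x) / norm for x in g]


def p_adaptive_large_n(n):
    """Large-N estimate of the adaptive fraction for a spherical landscape."""
    return 1.0 / (2.0 * math.sqrt(math.pi * math.log(n * math.sqrt(2.0 / math.pi))))


# Hypothetical usage: one random chemotype with 10,000 elements.
rng = random.Random(0)
p_single = p_adaptive(random_spherical_rhos(10000, rng))
p_estimate = p_adaptive_large_n(10000)
```

If all $\rho_i$ were equal the function returns exactly one-half, the limiting case described at the start of this appendix. For a random chemotype with 10,000 elements both the direct evaluation and the large-$N$ estimate come out near 0.1.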

Specializing to the spherical case and plugging in for
$\rho_i$, we have

$$P_a = \frac{\langle|\hat{k}\cdot\hat{r}_i|\rangle}{2\max_i|\hat{k}\cdot\hat{r}_i|}, \tag{A.2}$$

where the angle brackets denote an average over the chemotype
elements $i$. Asymptotically for large $N$, $\hat{k}\cdot\hat{r}_i$ has a Gaussian probability
density with variance $1/N$. Averaging yields

$$\langle|\hat{k}\cdot\hat{r}_i|\rangle = \sqrt{\frac{2}{\pi N}}. \tag{A.3}$$

The largest absolute value of $N$ samples drawn from a
Gaussian density with variance $1/N$ is asymptotically
$\sqrt{2\log(N\sqrt{2/\pi})/N}$ [32, Eq. 4.2.3(11)]. Plugging these into Eq. A.2 yields

$$P_a(N) \approx \frac{1}{2\sqrt{\pi\log(N\sqrt{2/\pi})}}. \tag{A.4}$$

This probability remains substantial even for very large $N$.
For example, $P_a(10{,}000)$ is roughly 0.1, an unrealistically
large value.

Numerical tests with both the mildly and wildly non-spherical
landscapes considered in Fig. 4 yield a $P_a$ of at
least 0.13, consistent with the spherical result of Eq. A.4.
This suggests that, for a realistic fraction of mutations to
be deleterious, the typical scale of chemotype effects for
mutations must be larger than $\max_i \rho_i$, even for very
non-spherical landscapes.

Appendix B. Fitness effects distribution

Given the probability density of chemotype mutation
effects $f(r)$, the distribution $f_a(Q)$ of fitness effects for
adaptive mutations is

$$f_a(Q) \propto \sum_i \int dr\, f(r)\, \delta\bigl(Q - Q_i(r)\bigr), \tag{B.1}$$

where the sum is over all chemotype elements $i$, and $Q_i(r)$
is given by Eq. 3. Making the variable substitution $u = Q_i(r)$ yields

$$f_a(Q) \propto \sum_i \int du\, \frac{f\bigl(Q_i^{-1}(u)\bigr)\,\delta(Q - u)}{\sqrt{2\,\hat{r}_i\cdot\mathbf{S}\cdot\hat{r}_i\,(\theta_i - u)}}. \tag{B.2}$$

Further making the approximation that $f(r)$ is uniform
over the range of adaptive mutations, and substituting
Eq. 4 yields Eq. 8.

Appendix C. Average distributions

The distribution of adaptive fitness effects averaged
over initial chemotypes with a given fitness can be calculated
by averaging $f_a(Q)$ over the probability density
for $\theta_i$. For a spherical fitness landscape, the $\theta_i$ are proportional
to the squared magnitudes of the components of
the unit vector $\hat{k}$. Asymptotically as the number of dimensions
$N \to \infty$, these are squares of Gaussian variables
and have probability density

$$f_\theta(\theta_i) \propto \frac{\exp\left[-\theta_i N/(2Q_{\mathrm{tot}})\right]}{\sqrt{\theta_i}}, \tag{C.1}$$

which is a $\chi^2$ density with one degree of freedom. For the
spherical fitness landscape the result is:

$$\langle f_a(Q)\rangle \propto \exp\left[-QN/(4Q_{\mathrm{tot}})\right] K_0\left[QN/(4Q_{\mathrm{tot}})\right], \tag{C.2}$$

where $K_0$ is the zero-order modified Bessel function of the
second kind. This is identical to Waxman and Welch's
result for the model with maximum pleiotropy [27].

Appendix D. Extreme value theory for $\Delta$

The normalized spacing between cusps $\Delta$ is a ratio of
two values; to calculate its probability density we first calculate
the density of $\Omega = \log\theta_1 - \log\theta_2$, the spacing between
the logarithms of the largest two $\theta$s. Defining

$$\omega = \log\left(\frac{\theta N}{Q_{\mathrm{tot}}}\right) \tag{D.1}$$

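and using the asymptotic density for $\theta$ yields the asymptotic
probability density of $\omega$:

$$f(\omega) = \frac{1}{\sqrt{2\pi}}\exp\left[-\tfrac{1}{2}\bigl(\exp(\omega) - \omega\bigr)\right]. \tag{D.2}$$

The corresponding cumulative probability distribution
$F(\omega) = \int_{-\infty}^{\omega} f(\omega')\,d\omega'$ is

$$F(\omega) = \operatorname{erf}\left(\frac{\exp(\omega/2)}{\sqrt{2}}\right), \tag{D.3}$$

where erf is the error function. This distribution has
exponential-type extreme value statistics [32].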

Following the terminology and notation of Gumbel [32],
the typical size $u_{1,N}$ of the largest of $N$ samples from the
density $f(\omega)$ is given by $F(u_{1,N}) = 1 - 1/N$. In our case this
is

$$u_{1,N} = 2\log\left(\sqrt{2}\,\operatorname{erf}^{-1}(1 - 1/N)\right). \tag{D.4}$$

The corresponding scale parameter $\alpha_{1,N}$ is

$$\alpha_{1,N} = N f(u_{1,N}), \tag{D.5}$$

and the distance $\Omega$ between the largest two samples has probability
density

$$f(\Omega) = \alpha_{1,N}\exp(-\alpha_{1,N}\,\Omega). \tag{D.6}$$

(Gumbel's result [32] for this distribution, his Eq. 5.3.5(4),
has $\alpha_{2,N}$ in place of $\alpha_{1,N}$. In the limit $N \to \infty$ the two
expressions are equal, but $\alpha_{1,N}$ is a better approximation
for small $N$.)

The distance between the logarithms $\Omega$ is related to $\Delta$
by $\Delta = 1 - \theta_2/\theta_1 = 1 - \exp(-\Omega)$. Thus the probability
density for $\Delta$ is

$$f(\Delta) = \alpha_{1,N}(1 - \Delta)^{\alpha_{1,N} - 1}, \tag{D.7}$$

and the average of $\Delta$ is

$$\langle\Delta\rangle = \frac{1}{1 + \alpha_{1,N}}. \tag{D.8}$$

A useful approximation for $\alpha_{1,N}$ can be obtained using
an asymptotic expansion for $\operatorname{erf}^{-1}$ [46]:

$$\sqrt{2}\,\operatorname{erf}^{-1}(1 - 1/N) \approx \sqrt{\log\frac{2N^2}{\pi} - \log\log\frac{2N^2}{\pi}}. \tag{D.9}$$

Propagating this expansion through $u_{1,N}$ and $\alpha_{1,N}$ and
neglecting terms of order $\log\log N$ in the final expression
yields

$$\alpha_{1,N} \approx \log N + \log\sqrt{2/\pi}. \tag{D.10}$$

From this follows the approximate expression for $\langle\Delta\rangle$ in
Eq. 11.

Appendix E. Numerical simulation 

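A random set of orthogonal unit vectors $\mathbf{v}_i$ can be
obtained from the eigenvectors of a matrix $\mathbf{G}$ from the
Gaussian Orthogonal Ensemble: $\mathbf{G} = \mathbf{H} + \mathbf{H}^T$, where the
elements of $\mathbf{H}$ are standard normal random numbers. A
matrix $\mathbf{S}$ with eigenvalues $\lambda_i$ can then be constructed via

$$S_{j,k} = \sum_i \lambda_i\, v_{i,j}\, v_{i,k}. \tag{E.1}$$

Random chemotypes $\mathbf{k}$ with specified log-fitness $Q_{\mathrm{tot}} =
-\log W(\mathbf{k})$ are obtained using the Cholesky decomposition
$\mathbf{A}$ of $\mathbf{S}^{-1}$, defined by $\mathbf{A}\cdot\mathbf{A}^T = \mathbf{S}^{-1}$. $\mathbf{k}$ is then given by

$$\mathbf{k} = \sqrt{2Q_{\mathrm{tot}}}\,\mathbf{A}\cdot\hat{k}, \tag{E.2}$$

where $\hat{k}$ is a random unit vector.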
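
The construction above can be sketched in a few lines of standard-library Python. This is an illustration, not the code used for the simulations in this paper, and it makes two substitutions that are labeled in the comments: the random orthonormal vectors come from Gram-Schmidt orthogonalization of Gaussian vectors rather than from GOE eigenvectors, and the factor $\mathbf{A}$ is taken to be $\mathbf{V}\boldsymbol{\Lambda}^{-1/2}$ instead of a Cholesky factor. Any $\mathbf{A}$ with $\mathbf{A}\cdot\mathbf{A}^T = \mathbf{S}^{-1}$ yields a chemotype with the requested $Q_{\mathrm{tot}}$. All function names are hypothetical.

```python
import math
import random


def random_orthonormal(n, rng):
    """Random orthonormal vectors via Gram-Schmidt (stand-in for GOE eigenvectors)."""
    basis = []
    while len(basis) < n:
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        for b in basis:
            proj = sum(x * y for x, y in zip(v, b))
            v = [x - proj * y for x, y in zip(v, b)]
        norm = math.sqrt(sum(x * x for x in v))
        if norm > 1e-8:  # discard numerically degenerate draws
            basis.append([x / norm for x in v])
    return basis


def build_landscape(eigenvalues, vectors):
    """Eq. E.1: S[j][m] = sum_i lambda_i * v_i[j] * v_i[m]."""
    n = len(eigenvalues)
    return [[sum(lam * v[j] * v[m] for lam, v in zip(eigenvalues, vectors))
             for m in range(n)] for j in range(n)]


def random_chemotype(eigenvalues, vectors, q_tot, rng):
    """Eq. E.2 with A = V Lambda^(-1/2), which satisfies A.A^T = S^(-1)."""
    n = len(eigenvalues)
    g = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(x * x for x in g))
    k_hat = [x / norm for x in g]  # random unit vector
    scale = math.sqrt(2.0 * q_tot)
    return [scale * sum(v[j] * c / math.sqrt(lam)
                        for lam, v, c in zip(eigenvalues, vectors, k_hat))
            for j in range(n)]


def log_fitness_deficit(S, k):
    """Q_tot = -log W(k) = k.S.k / 2 for the Gaussian landscape."""
    n = len(k)
    return 0.5 * sum(k[j] * S[j][m] * k[m] for j in range(n) for m in range(n))
```

A quick consistency check is to build a landscape, draw a chemotype, and confirm that `log_fitness_deficit` returns the requested $Q_{\mathrm{tot}}$ to within rounding error.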






